Exploring the Combinatorics of Motif Alignments for Accurately Computing E-values from P-values
Abstract
In biological and biomedical research, motif-finding tools are important for locating regulatory elements in DNA sequences. Many such tools are available, and they often report position weight matrices together with significance indicators. These indicators, p-values and E-values, describe, respectively, the probability that the background process alone generates a motif alignment at least as significant as the observed one, and the expected number of such alignments in the data set. The various tools often estimate these indicators differently, so their values are not directly comparable. One approach to comparing motifs from different tools is to compute the E-value as the product of the p-value and the number of possible alignments in the data set. In this paper we explore the combinatorics of the motif alignment models OOPS (one occurrence per sequence), ZOOPS (zero or one occurrence per sequence), and ANR (any number of repetitions), and propose a generic algorithm for accurately computing the number of possible combinations. We also show that using the wrong alignment model can give E-values that diverge significantly from their true values.
Keywords: Motif alignment, combinatorics, p-value, E-value, OOPS, ZOOPS, ANR.
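The abstract defines the E-value as the p-value multiplied by the number of possible alignments, and that number depends on the alignment model. The Python sketch below counts alignments using common textbook forms of the three models; it is an illustration under stated assumptions, not the authors' generic algorithm. Assumptions: N sequences of lengths L_i, motif width w, m_i = L_i - w + 1 candidate start positions per sequence (one strand only), and an alignment of n sites. OOPS uses the product of the m_i; ZOOPS uses the elementary symmetric polynomial e_n(m_1, ..., m_N) (choose n sequences, one site in each); ANR uses the binomial coefficient C(sum of m_i, n) and, as a simplification, does not exclude overlapping sites within a sequence. All function names are hypothetical. The E-value is returned in log10 form because alignment counts quickly exceed the range of a float.

```python
import math
from math import comb, prod


def start_positions(lengths, w):
    """Candidate start positions m_i = L_i - w + 1 per sequence (single strand)."""
    return [max(L - w + 1, 0) for L in lengths]


def count_oops(lengths, w):
    """OOPS: exactly one site in every sequence -> product of the m_i."""
    return prod(start_positions(lengths, w))


def count_zoops(lengths, w, n):
    """ZOOPS: n of the sequences carry one site each, the rest carry none.

    This is the elementary symmetric polynomial e_n(m_1, ..., m_N),
    computed with an O(N * n) dynamic programme.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    e = [1] + [0] * n  # e[k] = e_k over the sequences processed so far
    for m in start_positions(lengths, w):
        for k in range(n, 0, -1):  # descending, so each sequence is used at most once
            e[k] += e[k - 1] * m
    return e[n]


def count_anr(lengths, w, n):
    """ANR (simplified): any n of the pooled start positions.

    Assumption: overlapping sites within one sequence are NOT excluded.
    """
    return comb(sum(start_positions(lengths, w)), n)


def log10_e_value(p_value, n_alignments):
    """log10(E) = log10(p) + log10(#alignments); avoids float overflow."""
    return math.log10(p_value) + math.log10(n_alignments)


if __name__ == "__main__":
    lengths, w, n = [10, 10, 10], 3, 3
    counts = {
        "OOPS": count_oops(lengths, w),
        "ZOOPS": count_zoops(lengths, w, n),
        "ANR": count_anr(lengths, w, n),
    }
    for model, c in counts.items():
        print(model, c, round(log10_e_value(1e-6, c), 3))
```

For three sequences of length 10 and w = 3, OOPS (and ZOOPS with n = 3) counts 8 × 8 × 8 = 512 alignments, while the simplified ANR count is C(24, 3) = 2024, so the same p-value yields an E-value roughly four times larger under ANR. This illustrates how the choice of model changes the E-value.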
Similar resources
Application of Soft Computing Methods for the Estimation of Roadheader Performance from Schmidt Hammer Rebound Values
Estimation of roadheader performance is one of the main topics in determining the economics of underground excavation projects. Poor estimation of roadheader performance can lead to costly contractual claims. In this paper, the application of soft computing methods for data analysis, called adaptive neuro-fuzzy inference system-subtractive clustering method (ANFIS-SCM) and artificial neu...
Estimation of Reference Values of Biochemical Parameters Exploring the Renal Function in Adults in Ngaoundere, Cameroon
Clinical examinations are accompanied by biological analyses to guide or confirm the clinical diagnosis. The results of these analyses are interpreted by comparison with reference values. Studies on the biological norms of Africans are rare, if not almost nonexistent. The aim of this study was to establish population-specific reference values for biochemical indices serving as renal function biom...
Some remarks on the sum of the inverse values of the normalized signless Laplacian eigenvalues of graphs
Let $G=(V,E)$, $V=\{v_1,v_2,\ldots,v_n\}$, be a simple connected graph with $n$ vertices, $m$ edges and a sequence of vertex degrees $d_1\geq d_2\geq\cdots\geq d_n>0$, $d_i=d(v_i)$. Let $A=(a_{ij})_{n\times n}$ and $D=\mathrm{diag}(d_1,d_2,\ldots,d_n)$ be the adjacency and the diagonal degree matrix of $G$, respectively. Denote by $\mathcal{L}^+(G)=D^{-1/2}(D+A)D^{-1/2}$ the normalized signles...
On the signed Roman edge k-domination in graphs
Let $k\geq 1$ be an integer, and $G=(V,E)$ be a finite and simple graph. The closed neighborhood $N_G[e]$ of an edge $e$ in a graph $G$ is the set consisting of $e$ and all edges having a common end-vertex with $e$. A signed Roman edge $k$-dominating function (SREkDF) on a graph $G$ is a function $f:E\rightarrow\{-1,1,2\}$ satisfying the conditions that (i) for every edge $e$ of $G$, $\sum_{x\in N[e]} f...
Beyond the E-Value: Stratified Statistics for Protein Domain Prediction
E-values have been the dominant statistic for protein sequence analysis for the past two decades: from identifying statistically significant local sequence alignments to evaluating matches to hidden Markov models describing protein domain families. Here we formally show that for "stratified" multiple hypothesis testing problems, that is, those in which statistical tests can be partitioned natura...
Publication date: 2009